Abstract

Background: During the evolution of transposable elements, some processes, such as ancestral polymorphisms 
and horizontal transfer of sequences between species, can produce incongruences in phylogenies. We investigated 
the evolutionary history of the transposable elements Bari and 412 in the sequenced genomes of the Drosophila 
melanogaster group and in the sibling species D. melanogaster and D. simulans using traditional phylogenetic and 
network approaches. 

Results: Maximum likelihood (ML) phylogenetic analyses revealed incongruences and unresolved relationships for 
both the Bari and 412 elements. The DNA transposon Bari within the D. ananassae genome is more closely related 
to the element of the melanogaster complex than to the sequence in D. erecta, which is inconsistent with the 
species phylogeny. Divergence analysis and the comparison of the rate of synonymous substitutions per 
synonymous site of the Bari and host gene sequences explain the incongruence as an ancestral polymorphism that 
was inherited stochastically by the derived species. Unresolved relationships were observed in the ML phylogeny of 
both elements involving D. melanogaster, D. simulans and D. sechellia. A network approach was used to attempt to 
resolve these relationships. The resulting tree suggests recent transfers of both elements between D. melanogaster 
and D. simulans. The divergence values of the elements between these species support this conclusion. 

Conclusions: We showed that ancestral polymorphism and recent invasion of genomes due to introgression or 
horizontal transfer between species occurred during the evolutionary history of the Bari and 412 elements in the 
melanogaster group. These invasions likely occurred in Africa during the Pleistocene, before the worldwide 
expansion of D. melanogaster and D. simulans. 

Keywords: Transposable elements, Ancestral polymorphism, Horizontal transfer, Introgressive hybridization, Recent
invasion, Drosophila melanogaster group



Background 

Transposable elements are segments of repetitive DNA 
that can mobilize and propagate within host genomes. 
They have long been considered to be selfish DNA 
sequences because of the deleterious effects of their 
mobilization on the host genome. Recent advances in gen- 
ome analysis methods have revealed the significant contri- 
bution of transposable elements to genome evolution as 
sources of genomic novelty, as they can promote rearran- 
gements [1] and duplications [2] and can produce new 
regulatory sequences [3] that drive the changes necessary 
for genome evolution [4]. The emergence of transposable 
elements in a genome can occur in three ways: de novo
emergence, by the recombination of existing elements 
within genomes; horizontal transfer, by a vector; and intro- 
gression, by hybridization between two species (one with 
and one without a given element) [5,6]. The origin of a 
transposable element in a new genome by the last two 
processes may produce incongruences when the phyl- 
ogeny of the elements is compared to those of the species 
that harbor them. In addition, incongruence can also be 
produced when two or more variants in an ancestral 
lineage are stochastically inherited by the derived lineages. 
Horizontal transfer has been reported in several organisms 
(for a review see [7,8]), primarily between closely related 
species, given the requirement of shared time and space. 
In many cases, such species also share putative vectors 
(for a review see [9]); however, the occurrence of 
phylogenetic incongruence due to the stochastic inheritance
of ancestral polymorphisms, although potentially
common and frequently given as an alternative hypothesis 
to horizontal transfers [8], is less often demonstrated in 
the literature. 

The genus Drosophila has been the focus of numerous 
studies involving transposable elements, and the afore- 
mentioned processes have been described in these species 
via bioinformatics analyses and analysis of natural popula- 
tions [8,9]; such studies have focused on species of the 
melanogaster group, especially the melanogaster subgroup. 
This subgroup comprises nine species (D. yakuba, D. teissieri,
D. santomea, D. erecta, D. orena, D. melanogaster, D. simulans,
D. sechellia and D. mauritiana) that differ in
many aspects, such as geographical distribution and food 
and host preference, but that diverged relatively recently. 
The subgroup is one of ten subgroups of melanogaster, 
eight of which are found in Asia and three in Africa (melanogaster,
montium and ananassae); however, only the melanogaster
subgroup is endemic to the Afrotropical region
[10]. This subgroup is thought to have originated from a 
proto-melanogaster founder population that arrived in Af- 
rica 17-20 Mya from the Oriental region. This founder 
population gave rise to the evolutionary lineages that pro- 
duced the erecta supercomplex approximately 13-15 
Mya, the yakuba complex approximately 8-15 Mya and, 
more recently, the basal lineage of the melanogaster supercomplex.
Within this supercomplex, the most basal species,
D. melanogaster, arose between 2 and 3 Mya; D. simulans,
D. sechellia and D. mauritiana emerged very recently,
no more than 0.5 Mya [11,12]. D. melanogaster
and D. simulans are widespread due to very recent global
colonization. Also widespread is D. ananassae; this species
belongs to the ananassae subgroup, the basal clade in the
melanogaster species group [13,14] (Species Phylogeny,
Additional file 1: Figure S1). This species originated in
southeast Asia and subsequently dispersed to other parts 
of the world, possibly through human activity [15]. The 
availability of the complete genomes of five species in the 
subgroup [16] enables the description of numerous trans- 
posable element transfers [9]. Meanwhile, the sequencing 
of just one strain's genome (in four of the five species) and
the variable rates of genome coverage can prevent an accurate
understanding of the evolutionary history of the
elements in these species. One potentially important species
is D. ananassae, whose genome is available but rarely
included in studies. Its widespread distribution, from tropical
to subtropical regions, and highly substructured populations
make D. ananassae a model for studies of genetic
variation [17], such as the characterization of transposable 
elements. 

Among the transposable elements studied in the melanogaster
subgroup with regard to horizontal transmission or
introgression are the DNA transposon Bari (transfer
between D. melanogaster and D. simulans [18]) and the
retrotransposon 412 (transfers between D. melanogaster,
D. simulans and D. sechellia [19,20]). Bari, a DNA transposon
belonging to the Tc1-Mariner superfamily, is an
ancient element in the evolutionary lineage of drosophilids
that is widespread in both the Drosophila and
Sophophora subgenera of the Drosophila genus, although
it seems to have been lost in some species [21,22].
Within the genus, there are interspecific structural variations
in the terminal inverted repeats (TIRs), the size of
which would have changed over time [23]. Some variants,
such as Bari2 (distributed in both Drosophila and
Sophophora species) and Bari3 (described in D. willistoni,
D. pseudoobscura and D. mojavensis), harbor long
TIRs, called LIRs (Long Inverted Repeats), which are
approximately 250 bp long. Others, such as Bari1, which is
present in the melanogaster complex only, contain short
TIRs, called SIRs (Short Inverted Repeats), which are
approximately 26 bp long. These three variants, which
share over 50% amino-acid similarity, characterize three
subfamilies derived from a common Bari-element ancestor
[23]. The element 412, an LTR (Long Terminal
Repeat) retrotransposon that belongs to the Gypsy-like
superfamily, also seems to have appeared early in the
evolution of the Drosophilidae family and to have been
subsequently lost in some lineages [24]. In contrast to D.
melanogaster, the genome of which contains only one
412-subfamily element, D. simulans has two intragenomic
variants that differ in the size of the 5' LTR-UTR
regulatory region. These two subfamilies arose
from rearrangements and insertion-deletion events that
produced new elements that may be capable of escaping
host control [24].

Here, we studied the occurrence of both Bari and 412
in the six sequenced genomes of the melanogaster group
(D. ananassae, D. erecta, D. yakuba, D. melanogaster, D.
simulans and D. sechellia). In addition, we compared the
in silico results with those obtained from geographic
strains of the sibling species D. melanogaster and D.
simulans to uncover the evolutionary history of these
transposable elements in the subgroup. The results
expand our understanding of the processes that shaped
their evolution. We showed that at least two Bari variants
were present in the ancestral lineage of the melanogaster
group and were stochastically inherited, leading
to incongruences between the phylogeny of the species
and that of the transposable element. We also showed
that the transfer of Bari and 412 elements between
D. simulans and D. melanogaster occurred before the
worldwide dispersal of both species and involved only
one sequence per element. Thus, ancestral polymorphism,
losses and reintroductions can explain the evolutionary
distribution of these elements in these species of
the melanogaster species subgroup.

Results

Using the deposited sequences and the reference
sequences for the transposable elements Bari and 412 as queries,
we searched the sequenced genomes of the melanogaster
group. Homologous full-length and fragmentary sequences
of both elements were found in all species (see
Table S1 in Additional File 2 and Table S5 in Additional
File 3). The fragments were not included in the analyses
because most of them contained large deletions and
many nucleotide substitutions, which prevent the estimation
of the Ks values and the corresponding time of
divergence between the sequences. Therefore, only the
full-length sequences, with both TIRs for Bari and both
LTRs for 412, were used for the analyses.

DNA transposon Bari 

The number of full-length Bari sequences varied by species:
11 were found in D. melanogaster, two were found
in both D. simulans and D. sechellia, seven in D. erecta
and four in D. ananassae (see Table S2 in Additional File
2). The only sequence found in D. yakuba was a 215 bp
fragment resembling the Bari of D. erecta. The ML
reconstruction of evolutionary relationships among the
full-length sequences is shown in Figure 1A and
Additional file 2: Figure S2. The sequences of D. ananassae
and D. erecta are clustered in well-supported monophyletic
clades. Also well-supported is the clade
grouping the sequences of the melanogaster complex (D.
melanogaster, D. simulans and D. sechellia), albeit in a
polytomic branch; however, the Bari sequences of D.
ananassae cluster more closely to those of the melanogaster
complex than to those of D. erecta, which is inconsistent
with the species phylogeny (Additional File
1). The K2p distances further support this incongruence,
with the sequences of the melanogaster complex
and D. ananassae being less divergent from each
other than either is from D. erecta
(melanogaster complex vs. D. ananassae = 0.111 ±
0.012; melanogaster complex vs. D. erecta = 0.372 ±
0.024; D. ananassae vs. D. erecta = 0.3^0 ± 0.023; see
Table S3 in Additional File 2).
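
As an illustration of how a pairwise distance of this kind can be obtained, the sketch below computes the Kimura two-parameter distance, d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions. It assumes that "K2p" denotes this model and that the two sequences are already aligned; the function name and the toy sequences are hypothetical and are not taken from the study.

```python
import math

# Purine<->purine and pyrimidine<->pyrimidine changes are transitions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq_a: str, seq_b: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites with gaps or ambiguous bases in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared      # proportion of transitions (P)
    q = transversions / compared    # proportion of transversions (Q)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    # Hypothetical 10-bp alignment with a single A->G transition.
    print(round(k2p_distance("ACGTACGTAC", "GCGTACGTAC"), 4))
```

In practice such distances are averaged over all sequence pairs between two groups, with a standard error estimated by resampling; that step is not shown here.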

Two processes could be responsible for the phylogenetic
incongruence observed in the Bari phylogeny: recent
invasion of the melanogaster complex by a Bari
sequence from D. ananassae (or vice versa) or the existence
of an ancestral polymorphism followed by stochastic
inheritance. To distinguish between these two
possibilities, we estimated Ks, the rate of synonymous
substitutions per synonymous site, which provides a
measure of divergence at neutral sites, and the time of
divergence of the Bari sequences and of the host genes
(i.e., ADH and GAPDH). If the incongruence in the
phylogeny is due to differential fixation of Bari variants
present in the common ancestor, and the element is evolving vertically, then
the Ks values of Bari and of the host genes and their
times of divergence should be equivalent; however, if the
estimates for the transposable element sequences are
significantly lower than those for the host genes, then
Bari was likely transferred after the species divergence,
through horizontal transfer or species hybridization. The
sequences of D. sechellia and D. erecta were not utilized
in this analysis because they contained large numbers of
stop codons and small deletions. The Bari sequences
of D. ananassae showed few premature stop codons, all
of which were excluded from the alignment, so these
sequences were used in the Ks estimate. The average Ks
values of the Bari sequences and host genes were as follows:
D. melanogaster vs. D. ananassae, Bari Ks = 0.409 ±
0.049 and host genes Ks = 0.428 ± 0.058; D. simulans
vs. D. ananassae, Bari Ks = 0.409 ± 0.047 and host genes
Ks = 0.422 ± 0.055 (see Tables S4 and S5 in Additional
File 2). Using the estimated Ks and 0.011 substitutions
per site per million years (My) [25] as the rate of synonymous
substitution, the average time of divergence of
Bari in these species was estimated at 17.68 My, and that
of the host genes was 19.34 My. During this latter period,
the lineages that gave rise to D. ananassae and the melanogaster
subgroup were still evolving in Asia [10], suggesting
that the Bari sequences diverged from a common
sequence in the common ancestor. These estimates
suggest that the incongruence resulted from stochastic
retention of the same Bari1-like variant by the ancestors
of D. ananassae and of the melanogaster complex. The
loss of parts of the TIRs followed, yielding long TIRs in
D. ananassae and short TIRs in the melanogaster
complex [23].
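
The reasoning above can be summarised as a simple decision rule. The sketch below is only an illustration: it treats the element as "significantly lower" than the host genes when the two mean ± deviation intervals do not overlap, which is an assumption made here for simplicity and not the statistical test used in the study. The function name and the second set of values are hypothetical.

```python
def compare_ks(te_ks: float, te_dev: float,
               host_ks: float, host_dev: float) -> str:
    """Classify a transposable element (TE) vs. host-gene Ks comparison.

    Assumed rule (not from the source): the TE is considered clearly less
    diverged than the host genes only if its upper bound (mean + deviation)
    lies below the host lower bound (mean - deviation).
    """
    te_upper = te_ks + te_dev
    host_lower = host_ks - host_dev
    if te_upper < host_lower:
        return "transfer after species divergence"
    return "consistent with vertical inheritance"


if __name__ == "__main__":
    # Values reported in the text for D. melanogaster vs. D. ananassae.
    print(compare_ks(0.409, 0.049, 0.428, 0.058))
    # Hypothetical values for an element far less diverged than its host.
    print(compare_ks(0.0004, 0.0004, 0.05, 0.01))
```

With the values reported for Bari and the host genes, the intervals overlap widely, which matches the interpretation of vertical inheritance given above.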

The ML tree did not allow us to resolve the relationships
among the Bari1 sequences within the melanogaster
complex (D. melanogaster, D. simulans, D. sechellia)
because the sequences are very similar and cluster
within an unresolved branch. We therefore reconstructed
the sequence relationships using a network.
This approach can resolve relationships among
sequences with low diversity and can thus be used to
infer the origin of multiple copies from a unique sequence.
In addition, the network reveals relationships
between ancestral and derived sequences and introduces
median vectors, which represent ancestral, lost or
unsampled sequences [26,27]; these relationships cannot
be inferred from the classical phylogenies.
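
The two quantities a network of this kind is built from, haplotype frequencies (drawn as circle sizes) and the number of mutations separating haplotypes (drawn as branch lengths), can be sketched as follows. This is not a median-joining implementation and it does not infer median vectors; it only shows the inputs such a method works with. The function names and the toy alignment are hypothetical.

```python
from collections import Counter
from itertools import combinations


def haplotype_frequencies(sequences):
    """Collapse identical aligned sequences into haplotypes with counts."""
    return Counter(sequences)


def mutation_count(hap_a: str, hap_b: str) -> int:
    """Number of sites at which two aligned haplotypes differ."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must be aligned to the same length")
    return sum(x != y for x, y in zip(hap_a, hap_b))


def pairwise_mutations(sequences):
    """Mutation counts for every pair of distinct haplotypes."""
    haps = sorted(haplotype_frequencies(sequences))
    return {(a, b): mutation_count(a, b) for a, b in combinations(haps, 2)}


if __name__ == "__main__":
    # Hypothetical 4-bp alignment: one frequent haplotype, two rare ones.
    copies = ["ACGT", "ACGT", "ACGA", "TCGT"]
    print(haplotype_frequencies(copies))
    print(pairwise_mutations(copies))
```

In this toy example the most frequent haplotype is one mutation away from each of the others, the star-like pattern that the text interprets as copies derived from a single central sequence.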

The network shows a second phylogenetic incongruence
in the Bari1 sequences, revealing a closer relationship
between the copies of D. simulans and D. melanogaster than
either has to D. sechellia (Figure 1B). The two full-length
sequences of D. simulans are directly related to a unique
sequence of D. melanogaster, which is centrally positioned
on the network and is the sequence from which
all the other sequences of this species diverged. Moreover,
all sequences of D. melanogaster are very similar
(K2p = 0.0020 ± 0.0006), suggesting a recent origin. The
age of these copies was estimated at ~40,000 y (Ks = 0.00086 ±
0.00084), and the longest time of divergence between the
Bari1 sequences of D. melanogaster and those of D. simulans
was estimated at ~196,900 y (Ks = 0.0004 ± 0.0004;
average time = 32,800 y; shortest time = 0; see Table S4 in
Additional File 2). Because the D. melanogaster and D. simulans
lineages split from a common ancestor between 2 and 3
Mya [11,12], the network topology and the divergence time of
the Bari1 sequences in both species are inconsistent with the
species phylogeny and the estimated divergence time. These
data suggest a very recent transfer of Bari1 from D. simulans
to D. melanogaster and further suggest that following this
transfer, the sequence remained active and dispersed within
the D. melanogaster genome, producing the similar copies
that are observed today. The short branches and the presence
of several similar copies in the D. melanogaster network, all
derived from the same sequence, are evidence of a transposition
burst, a common process after the introduction of a new
element into a naive genome [20], supporting the hypothesis
of the transfer from D. simulans to D. melanogaster.

Figure 1 Phylogenetic reconstructions conducted using sequences of the DNA transposon Bari in the melanogaster group of
Drosophila. (A) Phylogenetic analysis by maximum likelihood and (B) network using the sequences of the transposase of the full-length copies
obtained from sequenced genomes of the melanogaster group; (C) network reconstructed using a region of the transposase sequenced in
natural populations of the D. melanogaster and D. simulans species. In the network, full circles correspond to the sampled sequences; empty
circles correspond to median vectors (ancestral nodes), which represent lost sequences or sequences not sampled. Circle size corresponds to
sequence frequency; branch size is proportional to the number of mutations that occurred, as indicated by the numbers above the branches.

The similarity of the two D. simulans Bari1 sequences
to that of D. melanogaster is high, with the sequences
differing at only a few sites. To determine whether the
proposed recent transfer is exclusive to the sequenced
genomes, we sequenced a region of the transposase gene in
different strains of both species and reconstructed their
relationships (Figure 1C). The strains analyzed represented
natural populations of different geographic origins:
Africa, the ancestral site; Asia and Europe,
continents that were first colonized by both species
(ancient invaders); and Brazil, where colonization occurred
relatively recently (recent invaders). The evolutionary
reconstruction shows a sequence shared among all strains.
This central sequence, which likely corresponds to the
central sequence depicted in Figure 2B, could be the
sequence transferred between the species. The sharing of
this sequence among strains of different geographic origins
(Africa, Asia, Europe and Brazil) suggests that
transfer of Bari occurred before the global dispersal of
D. melanogaster.

Retrotransposon 412 

As with Bari, we found different numbers of full-length
copies (with both LTRs) of the retrotransposon 412 in
different species of the melanogaster group. The smallest
numbers were found in D. simulans (2) and D. yakuba
(2), followed by D. erecta (8), D. sechellia (11), D. ananassae
(14) and D. melanogaster (27; see Table S5 in
Additional File 3). The ML phylogeny based on the gag
region shows monophyletic branches clustering the
sequences of D. ananassae, D. erecta and D. yakuba
with high statistical support (Figure 2A and Additional
file 3: Figure S3). As in the species phylogeny, the D. ananassae 412
was the first to diverge. Next, the ancestral element of
D. yakuba and D. erecta diverged, also mirroring the
species divergence (Additional File 1); however, in the
melanogaster complex, the 412 sequences did not form
monophyletic groups within each species. Given the very
recent divergence of D. melanogaster, D. simulans and
D. sechellia (~2-3 Mya and 0.5 Mya, respectively
[11,12,28]), the sequences of 412 could not have had time
to coalesce, potentially explaining the unresolved relationships
among them. Alternatively, the 412 sequence
could have been exchanged between these species.

To resolve the phylogenetic relationships of 412 within
the melanogaster complex, we again used the network
approach (Figure 2B). The reconstruction shows long and
short branches connecting the sequences of D. sechellia,
suggesting the presence of old and young copies in this
species. These sequences are connected to two copies of D.
simulans through median vectors (ancestral or unsampled
sequences). In D. simulans, only two full-length sequences
were sampled; these sequences are located in different
regions of the network. One is related by a long branch to
the old sequences of D. sechellia through median vectors,
whereas the other is more closely related to all the
sequences of D. melanogaster. As shown in Figure 2B, all
copies of D. melanogaster are directly derived from this
sequence of D. simulans, and all have short branches,
indicating a very recent origin. This relationship suggests a
transfer of 412 from D. simulans to D. melanogaster.

To confirm this pattern in strains derived from several
natural populations, we sequenced the integrase region
of the 412 element in the same strains of D. melanogaster
and D. simulans analyzed for Bari (from Africa,
Asia, Europe and South America) and reconstructed the
network (Figure 2C). A similar relationship was
observed in the strains of both species, where most of
the D. melanogaster sequences show short branches and
are derived from only one D. simulans sequence. Two
sequences of D. melanogaster are excluded from this
group. They are unresolved and connected by a median
vector to the D. simulans sequences, as is apparent from
the reticulation in the network. There are two main
groups of D. simulans sequences: one with resolved relationships
and short branches, and the other presenting
reticulations and several long branches. This scenario
reflects the ancient and complex evolutionary history of
412 in this species. Note that the sequence transferred
to D. melanogaster is shared by all D. simulans strains,
indicating that it is an active ancestral sequence in this
species. In addition, all D. melanogaster strains, regardless
of origin, share the sequence derived from D. simulans,
from which it differs by only one point mutation. The
sharing of ancestral sequences among strains from different
continents suggests that, as with Bari, transfer
occurred before the dispersal out of Africa.

Figure 2 Phylogenetic reconstructions conducted using the retrotransposon 412 sequences in the melanogaster group of Drosophila.
(A) Phylogenetic analysis by maximum likelihood and (B) network using the gag sequences of full copies obtained from sequenced genomes of
the melanogaster group; (C) network reconstructed using a region of the integrase, sequenced from samples of natural populations of D.
melanogaster and D. simulans. In the network, full circles correspond to the sampled sequences; empty circles correspond to the ancestral nodes,
which represent lost sequences or sequences not sampled. Circle size corresponds to sequence frequency; branch size is proportional to the
number of mutations that occurred, as indicated by the numbers above branches.

The age of the proposed transfer was estimated using 
the molecular clock equation (t = Ks /2r). To calculate Ks, 
the 27 full-length sequences of gag + pol ORFs extracted 
from the sequenced D. melanogaster genome were com- 
pared to the ancestral sequence from D. simulans 
(Figure 2B, detail). The oldest age estimated is 146,000 y 
(average = 33,674 y ± 0.0084; lowest = 0.0 y), suggesting 
that the transfer of 412 from D. simulans to D. melano- 
gaster occurred very recently (see Table S7 in Additional 
File 3). Indeed, the ages of the insertions in the D. melanogaster
genome, as calculated by the divergence between
the LTRs of each copy, are 94,697 y (the highest)
and 0.0 y (the lowest; see Figure S4 in Additional File 3). 
It is known that at least two factors can introduce biases 
into this estimation. First, the LTRs evolve, in general, 
faster than the coding domains of the retrotransposon 
sequence. Second, because the LTRs of the new copy are 
synthesized from only one maternal LTR at the moment 
of reverse transcription, the new LTRs are identical at 
the point of insertion. This process may conceal the ac- 
cumulation of divergence between copies. Despite these 
biases, this approach has been widely used [29,30] and 
provides useful information about the date of insertion 
of each copy. The estimated age indicates that transpos- 
ition would have started soon after the introduction of 
the copy and continued until very recently, congruent 
with empirical data and simulations [7]. The analysis 
shows that all full-length 412 D. melanogaster sequences
were inserted into the genome no more than 0.1 Mya,
while in D. simulans, the insertion occurred approxi- 
mately 0.3 Mya. 
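
The clock equation quoted above, t = Ks / 2r, can be sketched as follows using the synonymous rate of 0.011 substitutions per site per million years cited in the text [25]. The function names are hypothetical. The published ages are averages over individual pairwise or per-copy estimates in the supplementary tables, so applying the formula to a mean Ks may not reproduce them exactly.

```python
RATE_PER_SITE_PER_MY = 0.011  # synonymous rate quoted in the text from [25]


def divergence_time_years(ks: float, rate: float = RATE_PER_SITE_PER_MY) -> float:
    """Molecular clock estimate t = Ks / (2r), returned in years."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return ks / (2 * rate) * 1_000_000


def expected_ks(time_my: float, rate: float = RATE_PER_SITE_PER_MY) -> float:
    """Ks expected between two lineages that split time_my million years ago."""
    return 2 * rate * time_my


if __name__ == "__main__":
    # Ks reported in the text for the Bari1 copies of D. melanogaster.
    print(round(divergence_time_years(0.00086)))
    # Ks expected if an element had diverged with the species 2-3 Mya.
    print(round(expected_ks(2), 3), round(expected_ks(3), 3))
```

Under this rate, a Ks of 0.00086 corresponds to roughly 39,000 years, in line with the ~40,000 y reported for the Bari1 copies, whereas sequences that had diverged together with the two species 2 to 3 Mya would be expected to show a Ks of about 0.044 to 0.066. The same equation applied to the divergence between the two LTRs of a single copy gives the insertion ages discussed above.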

Discussion 

The DNA transposon Bari and the retrotransposon 412 
are found widely in Drosophila, suggesting a long evolu- 
tionary history within the genus [23,31,32]. Here, we 
performed phylogenetic analyses involving both trad- 
itional and network approaches that allowed us to reveal 
the occurrence of ancestral polymorphism and recent 
transfer of transposable elements between D. melanogaster
and D. simulans.

Ancestral polymorphism 

We performed an in silico search for the DNA trans- 
poson Bari in the sequenced genome of species of the 
melanogaster group, and we sequenced a region of the 
transposase in different geographic strains of D. melanogaster
and D. simulans. The element exhibits structural
variations related to its TIRs, which characterize various
Bari subfamilies [23]. Both long and short TIRs were
observed in the sequences analyzed in our study. The
sequences of elements found in D. ananassae and D.
erecta, which contain long TIRs, included stop codons
and are therefore inactive, whereas the sequences of D.
melanogaster, D. simulans and D. sechellia included
short TIRs. In D. melanogaster and D. simulans, there
were full-length sequences without stop codons, which
are therefore putatively active copies. In contrast,
the full-length sequences in D. sechellia included multiple
stop codons.

The element found in D. ananassae, with long TIRs, is
more closely related to the element of D. melanogaster,
which has short TIRs, than to that of D. erecta, which
also has long TIRs. These relationships produce incongruences
between the element and species phylogenies.
The element found in D. erecta (Bari2 subfamily) is
widely distributed in Drosophila, whereas those in D.
ananassae (Bari1-like subfamily) and in the melanogaster
complex (Bari1 subfamily) are restricted to their respective
species; this pattern indicates that the Bari
element of D. erecta is older than that of D. ananassae
[23]. Therefore, we propose that the ancestor of the melanogaster
species group possessed at least two Bari variants
that were stochastically inherited by the derived
species. The estimated age of the common ancestor of
the Bari sequences in the genomes of D. ananassae and
D. melanogaster (~17 Mya) is similar to the ages of
their host genes (~19 Mya). During this period, the
proto-melanogaster lineage was still diversifying from its
sister subgroups in Africa. The only published estimate
for the diversification between the melanogaster and
ananassae subgroups is ~44 Mya [25], but it could have
occurred more recently, as all of the divergence times
estimated in that study are more than two times higher
than other estimates (e.g., between the melanogaster and
montium subgroups, 41.3 Mya [25] vs. 12.7 Mya [33],
and between D. melanogaster and D. simulans, 5.4 Mya
[25] vs. 2.3 Mya [33] and 2-3 Mya [11]). In conclusion, (i)
the phylogenetic incongruence arising from clustering
the Bari sequences of D. ananassae with those of the
melanogaster complex and (ii) the older estimated age of
their Bari ancestor with respect to the age of diversification
of the melanogaster subgroup and the migration of
its ancestral lineage to tropical Africa support the hypothesis
of vertical inheritance with stochastic retention
of polymorphic sequences of Bari in these species. The
ancestral polymorphism hypothesis is also supported by
the smaller distances between the elements of D.
ananassae and the melanogaster complex than between those
of D. erecta and the melanogaster complex.

As described in the Background section, few reports 
clearly demonstrate retention of ancestral polymorph- 
isms. One such study examines the DNA transposon 
mariner, which occurs in the melanogaster subgroup in 
D. simulans, D. sechellia, D. mauritiana, D. yakuba and
D. teissieri, but not in D. melanogaster, D. erecta and D.
orena. It is proposed that mariner was present in the ancestral
species prior to the radiation of the melanogaster
species subgroup and that the element was lost independently
in the lineages leading to D. melanogaster and
D. orena - D. erecta. In addition, the mariner sequences
of D. simulans and D. mauritiana share active copies, a
subset of all mariner sequences, that cluster together rather
than according to the species phylogeny. This
shared polymorphism in populations of D. simulans
worldwide and in D. mauritiana indicates retention of
ancestral polymorphisms [34]. In both the mariner study
and in ours, the rates of evolution of the DNA trans- 
poson and of a host gene were compared to test the 
ancestral-polymorphism hypothesis and to explain in- 
congruence between the phylogenies of the species and 
the transposable element. 

Transposable element recent invasion 

The incongruence between the species phylogeny and
the phylogenies of the Bari and 412 elements, along with
the ages of the sequences shared between D. melanogaster
and D. simulans (less than the age of the species
divergence), could arise through either horizontal transfer
or introgressive hybridization. An increasing number
of reports in the last two decades (mostly published following
the rise of large-scale genome sequencing, which
allows analysis of most copies from a given genome as
well as broader comparative evolutionary analysis) suggest
that horizontal transfer of transposable elements
occurs frequently in eukaryotes (for a review see [7]),
especially in Drosophila (for a review see [8,9]). Particularly
for D. melanogaster and D. simulans, evidence of
horizontal transfer is accumulating in the literature
[19,35-42], including for the elements Bari [19,37] and 412
[19,37,42], the focus of this study; however, D. melanogaster
and D. simulans can hybridize, even today, both in
the laboratory [43] and in nature [44]; therefore,
transposable elements could also have been introduced via
hybridization.

In order for either horizontal transfer or introgression 
to occur, species must overlap in time, space and habitat. 
Moreover, for horizontal transfer, a shared potential vec- 
tor (e.g., a virus or endobacterium) is required. Cur- 
rently, the sibling species D. melanogaster and D. simulans 
are cosmopolitan and are sympatric in many parts of the 
world; however, our analyses suggest that transfer events 
occurred before their worldwide expansion but after species
divergence. It is thought that D. melanogaster was the
first species to diverge from a common ancestor in West
Africa and that the ancestor of D. simulans, the
proto-simulans lineage, migrated to East Africa and occupied the
Pacific islands and then diversified. After the divergence,
D. simulans returned to the mainland and expanded,
coming into contact with D. melanogaster populations
in Africa during the Late Pleistocene (between approximately
120 and 9 thousand years ago) [10]; we estimate that transfer of
both elements occurred during this period. Overlaps in
space, time, and most likely niche, may have provided 
the necessary conditions for both horizontal transfer 
and introgression. Regardless of the precise mechanism, 
after the transfers occurred, both species expanded out 
of Africa, D. melanogaster with the Homo sapiens mi- 
gration and D. simulans more recently, most likely dur- 
ing the great navigations [12]. 

The presence of defective fragments of Bari and 412 
elements in the sequenced genomes of both species and 
the presence of two more divergent sequences of the 
retrotransposon 412 in populations of D. melanogaster
(as shown in the network) indicate that both elements 
were present in the common ancestor of these species 
and were inherited by both species. The presence of 
full-length and putatively active copies in D. melanogaster,
which were derived exclusively from sequences
transferred from D. simulans, suggests that D. melanogaster
either did not inherit active copies of Bari
and 412 from the common ancestor or lost these copies
early in its diversification. The defective fragments that
are still present today would then be remnants of the
copies inherited from its ancestor. After the reintroductions
of Bari and 412, the transferred sequences
remained active in D. melanogaster, giving rise to the 
majority of copies that are currently present in this spe- 
cies. We show here that the amplification of these cop- 
ies occurred in a short period of time and at elevated 
rates, resulting in a burst of transposition. This process 
can be deduced from the network by the presence of 
identical sequences clustered in nodes or by similar 
sequences connected by short branches; these character- 
istics are observed in both species, in both the 
sequenced genomes and in natural populations. Bursts 
of transposition have been previously reported, in silico, 
for other elements in the genomes of the melanogaster 
subgroup, such as the Helitron DINE-1 in D. yakuba
and D. ananassae [45] and numerous LTR retrotransposons
in the D. melanogaster euchromatin [20].
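
The signature of a transposition burst described above (many identical copies collapsing into one node, with close variants a few substitutions away) can be illustrated with a minimal sketch. It assumes the copies are already aligned to equal length; the toy sequences and the function names `collapse_haplotypes` and `hamming` are hypothetical and are not taken from the Network package used in this study.

```python
from collections import Counter


def collapse_haplotypes(seqs):
    """Group identical aligned sequences; return (sequence, count) pairs,
    most frequent first. Each pair corresponds to one node of a network."""
    counts = Counter(s.upper() for s in seqs.values())
    return counts.most_common()


def hamming(a, b):
    """Number of differing sites between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy alignment (not data from this study).
copies = {
    "copy1": "ACGTACGTAC",
    "copy2": "ACGTACGTAC",
    "copy3": "ACGTACGTAC",
    "copy4": "ACGTACGTAT",
    "copy5": "ACGAACGTAC",
}

haplotypes = collapse_haplotypes(copies)
major_seq, major_count = haplotypes[0]

# A large node surrounded by variants one step away is the pattern
# interpreted in the text as a recent burst of transposition.
for seq, n in haplotypes:
    print(seq, "copies:", n, "steps from major node:", hamming(seq, major_seq))
```

A real analysis would build the network itself (for example by median joining); this sketch only shows how identical copies collapse into nodes and how short branches correspond to small numbers of differences.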

The element 412 occurs in two subfamilies in D.
simulans. Only one is observed in D. melanogaster, and it
is very similar to one of the subfamilies found in D.
simulans [24,30]. Therefore, according to our data, an
ancestral sequence of the melanogaster complex was
likely inherited by both species, but diversification
occurred only in D. simulans; later, one of the two
subfamilies was transferred to D. melanogaster, which at
the time had only defective copies derived from its
ancestral lineage. The subfamilies present in D. simulans
were the result of rearrangements, indels and point
mutations in the regulatory sequences of the 5' LTR-UTR
region; these changes gave rise to elements
that were able to overcome the host control of
transposition and thus to become invaders [24,30].
This process may explain why the retrotransposon 412
remained active in D. simulans following its divergence
and retained its capacity for amplification following its
transfer to D. melanogaster, whose host control system,
which had coevolved with the sequences inherited
from the ancestral copies, could not recognize this
new element. Data from the literature (reviewed in [9]) 
suggest that several elements have been independently 
transferred between the two species over time (e.g., 
Copia, tirant, Opus, Gypsy 2, Gypsy 5, Gypsy 6, 297,
17.6), but several others may have been transferred
simultaneously and very recently (e.g., 412, Blood,
Stalker 2, Transpac, Flea), as can be deduced from the
very similar ages of the transfers. There is a consensus
that multiple elements have recently arrived in D.
melanogaster; however, their origins either have not been
suggested [20,23] or were not clearly demonstrated
[38,46]. Utilization of the network approach allowed us
to propose, at least for Bari and 412, that D. simulans
was the donor species.



Conclusions 

The results obtained here allow us to propose that the
incongruences observed in the phylogenies of the Bari and
412 elements result from ancestral polymorphism as
well as recent invasion of the D. melanogaster genome by these
elements. The ancestral polymorphism associated with
Bari is supported by phylogenetic incongruence and by a
divergence time of Bari between the D. melanogaster complex
and D. ananassae similar to that of the host genes.
The hypothesis of recent invasion of both elements is
supported by phylogenetic incongruences revealed by network
trees; in addition, the divergence time between the
transposable element sequences is shorter than that between
the species involved. Moreover, D. simulans is thought to
have transferred sequences of both elements to D. melanogaster.
The latter species in turn would either not have inherited
the active copy that existed in its ancestor or would have
lost it soon after diverging, and all of its full-length sequences
would have been derived from the sequence that was
transferred from its sibling species. This introduction
would have occurred in Africa before the worldwide
expansion of the species, most likely in the Late Pleistocene,
when D. melanogaster and D. simulans returned to
sympatry in Africa after diversification in allopatry. In D.
melanogaster, the elements would then have passed through a
burst of transposition, producing a high number of
full-length copies over several thousand years.

Methods 

In silico analyses

The search for copies of the retrotransposon 412 and of
the DNA transposon Bari in the sequenced genomes of
species of the Drosophila melanogaster group (release 5.18
for D. melanogaster and 1.3 for all other species [16]) was
performed using BLASTn [47]. The deposited sequences
(Repbase databank [48]) described in D. melanogaster
were used directly to search in this species. The Bari
sequence used [GenBank: X67681] is 1,728 bp long and
encodes a 340 aa transposase. The 412 sequence
[GenBank: X04132] is 8,039 bp long and encodes a 452 aa
gag-like protein and a 1,001 aa pol-like polyprotein
(Reverse transcriptase, RNase H and Integrase).
For the other species, the deposited sequences
were used to identify the reference sequence, i.e., the
most complete and conserved sequence of each element
in each species. These sequences were used to search for
complete and incomplete copies in each genome. The
complete copies were tested for the presence and integrity
of the transposase gene (Bari) and gag and pol (412)
using the software ORF Finder [49]. In the search, the
hits with the smallest e-values (> 10"^) and highest RM 
scores (> 225) were selected. Alignment, reconstruction 
of the phylogeny by maximum likelihood (ML), calculation
of the rate of synonymous substitutions per

Table 1 Strains of D. melanogaster and D. simulans used in this study

Species          Classification    Geographic origin       Collector/Stock  GenBank accessions (Bari-1)  GenBank accessions (412)
D. melanogaster  Ancestral         Madagascar - Africa     David, J         JX140191-JX140203            JX140342-JX140346
D. melanogaster  Ancestral         Congo - Africa          14021-0231.24    JX140204-JX140220            JX140347-JX140352
D. melanogaster  Ancient Invader   Draveil - France        David, J         JX140221-JX140237            JX140353-JX140361
D. melanogaster  Ancient Invader   Delhi - Asia            David, J         JX140238-JX140256            JX140362-JX140368
D. melanogaster  Recent Invader    Florianopolis - Brazil  Granzotto, A     JX140257-JX140275            JX140369-JX140375
D. simulans      Ancestral         Madagascar - Africa     David, J         JX140276-JX140288            JX140376-JX140389
D. simulans      Ancestral         Zimbabwe - Africa       Begun, D         JX140289-JX140299            JX140390-JX140399
D. simulans      Ancient Invader   Draveil - France        Capy, P          JX140300-JX140307            JX140400-JX140411
D. simulans      Ancient Invader   Gorak - New Guinea      14021-0251.009   JX140308-JX140318            JX140412-JX140420
D. simulans      Recent Invader    Florianopolis - Brazil  Granzotto, A     JX140319-JX140332            JX140421-JX140435
D. simulans      Recent Invader    Pernambuco - Brazil     Rohde, C         JX140333-JX140341            JX140436-JX140445

Classification with regard to the geographic origin of the strains, names of the collectors or stock numbers, and GenBank sequence accession numbers.


synonymous site (Ks) and analysis of the Kimura
2-parameter distance (K2p) [50] were performed only for
full-length sequences with both TIRs (Bari) or both LTRs (412),
using the package MEGA5 [51]. The evolutionary relationships
between sequences were also reconstructed using the
package Network [52]. The ages of the transposable elements
were estimated using the molecular clock equation
r = k/(2T), where T is the divergence time, r is the rate of
neutral synonymous substitution in the genus Drosophila
(r = 0.011/site/million years [25]) and k is the rate of
divergence in the synonymous sites (Ks). The molecular-clock
hypothesis assumes that when genes from different species are
compared, the number of nucleotide changes is proportional
to the time since speciation. We then estimated the
divergence time between species of the melanogaster
group using the Ks values of the CDS of two host genes
(ADH: Alcohol dehydrogenase and GAPDH:
Glyceraldehyde-3-phosphate dehydrogenase 1; see Table 5S in
Additional File 2) and of the Bari transposase. The ages
of the insertions of the retrotransposon 412 were estimated
from the divergence between the two LTRs
of each copy, measured as K2p, in the molecular clock equation.
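
The age calculation described above can be sketched as follows. This is a minimal illustration, assuming two pre-aligned sequences (for instance the 5' and 3' LTRs of one 412 copy) and the standard Kimura 2-parameter formula; it is not the MEGA implementation used in the study, the function names are hypothetical, and the toy sequences are not real data. Only the rate r = 0.011 substitutions/site/million years is taken from the text.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
RATE = 0.011  # substitutions per site per million years, as quoted in the text


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.
    Sites with gaps or ambiguous bases are skipped (an assumption here)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared    # proportion of transitional differences
    q = transversions / compared  # proportion of transversional differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def age_myr(k, rate=RATE):
    """Solve r = k / (2T) for T, giving the age in million years."""
    return k / (2 * rate)


# Hypothetical toy example: two 100 bp sequences differing by one transition.
ltr5 = "A" * 100
ltr3 = "G" + "A" * 99
k = k2p_distance(ltr5, ltr3)
print(f"K2p = {k:.5f}, estimated age = {age_myr(k):.3f} million years")
```

The same `age_myr` function applies to Ks values between species: a divergence of k = 0.022 at synonymous sites corresponds to T = 0.022 / (2 x 0.011) = 1 million years.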

Molecular analyses 

The phylogenetic relationships between the Bari 
sequences of strains of D. melanogaster and D. simulans
of different geographic origin (Table 1) were also reconstructed.
These strains were classified as ancestors
(sampled in Africa) or invaders (ancient, sampled in Asia 
and Europe; or recent, sampled in Brazil) according to 
place of origin and literature reports [10]. 

Genomic DNA was extracted from 50 individuals [53].
DNA concentration and integrity were analyzed with a
spectrophotometer (NanoDrop). Amplification (PCR) was
performed using specific primers that anneal to nucleotides
412 to 1,133 (total length of 722 bp) in the Bari
transposase gene (Bari_F 5' CGG GCT GGT ATT GTT
GCT AGG TTT 3' and Bari_R 5' ATC CTA CCC TTA
TGG CAT GGA GCA 3') and to nucleotides 5,622 to
6,499 (total length of 878 bp) in the 412 integrase gene
(412_F 5' TGG SCR AGG TCA WAR GAC AT 3' and
412_R 5' RCT TTS TAT STT ATA GGG CC 3'), 0.625 unit
of Taq polymerase (Invitrogen), 200 ng genomic DNA,
1 mM of MgCl2, 1 X buffer, 0.08 mM of dNTPs and
0.4 mM of each primer, for a final volume of 25 µL. PCR
conditions were as follows: initial denaturation (94°C,
120 s), followed by 30 cycles of denaturation (94°C, 45 s),
annealing (69°C for Bari and 59°C for 412, 45 s) and
extension (72°C, 60 s). Each PCR product was analyzed by gel
electrophoresis on a 1.0% agarose gel, purified (DNA GFX
DNA & Gel Band, GE) and cloned (TOPO TA Cloning kit,
Invitrogen) according to the manufacturers' specifications.
Approximately 30 (D. melanogaster) and 20 (D. simulans)
clones were selected for plasmid extraction by a phenol/
chloroform protocol and sequenced using the universal
primers M13F and M13R. The sequences produced were
deposited in the GenBank database (Table 1).
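
As a quick consistency check on the primer description above, the following sketch expands the degenerate 412 primers using the standard IUPAC nucleotide codes and recomputes the expected amplicon lengths from the stated coordinates. The primer strings are copied as printed in this text; the function names are hypothetical, and treating the coordinates as inclusive is an assumption that happens to reproduce the stated lengths.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    return ["".join(combo) for combo in product(*(IUPAC[b] for b in bases))]


def amplicon_length(start, end):
    """Length of a fragment spanning positions start..end, both inclusive."""
    return end - start + 1


# Primer sequences as printed in the text.
primer_412_f = "TGG SCR AGG TCA WAR GAC AT"
primer_412_r = "RCT TTS TAT STT ATA GGG CC"

print("412_F variants:", len(expand_primer(primer_412_f)))
print("412_R variants:", len(expand_primer(primer_412_r)))
print("Bari amplicon:", amplicon_length(412, 1133), "bp")
print("412 amplicon:", amplicon_length(5622, 6499), "bp")
```

With inclusive coordinates, the computed lengths match the 722 bp and 878 bp fragments reported for Bari and 412.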

Additional files 



Additional File 1: Figure S1: the phylogenetic relationships
between species of the melanogaster group of Drosophila. 

Additional File 2: Tables and figure about the characteristics and 
evolutionary analyses of the DNA transposon Bari sequences found 
in the sequenced genomes of species of the melanogaster group of 
Drosophila. Table corresponding to the Ks analyses of the genes ADH 
and GAPDH. 

Additional File 3: Tables and figures about the characteristics and 
evolutionary analyses of the sequences of the retrotransposon 412
found in the sequenced genomes of species of the melanogaster
group of Drosophila.